bongwoo-bak commented Sep 25, 2025

Summary

This PR introduces a new SGLang connector that supports prefill/decode (P/D) disaggregation for the LLM-D routing sidecar. It enables concurrent prefill and decode operations through SGLang’s bootstrap mechanism.

Changes

  • Added connector_sglang.go implementing P/D disaggregation
  • Integrated bootstrap configuration (host, port, room)
  • Updated cmd/llm-d-routing-sidecar/main.go and internal/proxy/proxy.go

Features

  • Room-based communication for coordinating prefill/decode
  • Bootstrap port configurable via the SGLANG_BOOTSTRAP_PORT environment variable (default 8668)
  • Prefill requests are sent asynchronously; decode requests are sent synchronously and processed once the decode response is received

Test

  • Tested with SGLang prefill/decode services
  • Confirmed asynchronous prefill & synchronous decode execution
  • Successfully tested in Kubernetes cluster with AMD MI250 GPUs
  • Verified integration with Gateway and EPP

Contributor

elevran left a comment

A couple of high-level comments:

  • we're moving the sidecar to llm-d-inference-scheduler repo (targeting v0.4)
  • I suspect that supporting an additional inference server would affect other llm-d components (and perhaps require changes in IGW as well)

- vLLMPort := flag.String("vllm-port", "8001", "the port vLLM is listening on")
- connector := flag.String("connector", "nixlv2", "the P/D connector being used. Either nixl, nixlv2 or lmcache")
+ vLLMPort := flag.String("vllm-port", "8001", "the port vLLM is listening on (also used for SGLang)")
+ connector := flag.String("connector", "nixlv2", "the P/D connector being used. Either nixl, nixlv2, lmcache, or sglang")

Connector represents the mechanism for transferring KV between P and D instances.
Does using sglang here represent the same concept, or is it more an implementation of how the sidecar should communicate with the inference server? Based on connector_sglang.go it seems to be similar. Can you please confirm?

- if *connector != proxy.ConnectorNIXLV1 && *connector != proxy.ConnectorNIXLV2 && *connector != proxy.ConnectorLMCache {
- logger.Info("Error: --connector must either be 'nixl', 'nixlv2' or 'lmcache'")
+ if *connector != proxy.ConnectorNIXLV1 && *connector != proxy.ConnectorNIXLV2 && *connector != proxy.ConnectorLMCache && *connector != proxy.ConnectorSGLang {
+ logger.Info("Error: --connector must either be 'nixl', 'nixlv2', 'lmcache', or 'sglang'")

nit: perhaps it's worthwhile to list the options in an array (or map) and use that to generate L33, L56 and L57?
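
One possible shape for this suggestion is sketched below: a single slice drives the flag help text, the validation check and the error message. The names supportedConnectors, joinOptions, connectorUsage and validateConnector are hypothetical, and the string values are assumed from the help text in the diff rather than read from the proxy package constants.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedConnectors is the single source of truth for --connector values.
// The literal values are assumed from the flag help text in the diff.
var supportedConnectors = []string{"nixl", "nixlv2", "lmcache", "sglang"}

// joinOptions renders opts as "a, b, c or d", wrapping each item in quote.
func joinOptions(opts []string, quote string) string {
	quoted := make([]string, len(opts))
	for i, o := range opts {
		quoted[i] = quote + o + quote
	}
	switch len(quoted) {
	case 0:
		return ""
	case 1:
		return quoted[0]
	}
	last := len(quoted) - 1
	return strings.Join(quoted[:last], ", ") + " or " + quoted[last]
}

// connectorUsage builds the help string passed to flag.String.
func connectorUsage() string {
	return "the P/D connector being used. One of " + joinOptions(supportedConnectors, "")
}

// validateConnector returns nil for a known connector, or an error that
// lists every supported option.
func validateConnector(name string) error {
	for _, c := range supportedConnectors {
		if c == name {
			return nil
		}
	}
	return fmt.Errorf("--connector must be one of %s", joinOptions(supportedConnectors, "'"))
}
```

With this layout, adding another connector means appending one entry to the slice; the help text and error message follow automatically.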

  port := flag.String("port", "8000", "the port the sidecar is listening on")
- vLLMPort := flag.String("vllm-port", "8001", "the port vLLM is listening on")
- connector := flag.String("connector", "nixlv2", "the P/D connector being used. Either nixl, nixlv2 or lmcache")
+ vLLMPort := flag.String("vllm-port", "8001", "the port vLLM is listening on (also used for SGLang)")

  • consider generalizing the variable name to inferencePort?
  • similarly, change the description to a more generalized form?

We may want to change the CLI flag to something more generic as well, but that would be a breaking change. We could accept a new flag (e.g., serving-port?), mark the current one as deprecated, and allow the two to coexist for a while (e.g., prefer the new flag in code, fall back to the current one if the new flag is missing, and log a deprecation warning).
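
A rough sketch of how the two flags could coexist is shown below. The serving-port name comes from the suggestion above and is not an existing flag; resolveServingPort and defaultServingPort are hypothetical names, and the default of 8001 is taken from the current vllm-port flag.

```go
package main

import (
	"flag"
	"io"
)

// defaultServingPort mirrors the current default of --vllm-port.
const defaultServingPort = "8001"

// resolveServingPort parses args and returns the inference server port.
// The new --serving-port flag wins; the deprecated --vllm-port is used only
// as a fallback, in which case deprecated is true so the caller can log a
// warning. Both flag defaults are empty so that "unset" can be detected.
func resolveServingPort(args []string) (port string, deprecated bool, err error) {
	fs := flag.NewFlagSet("sidecar", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	servingPort := fs.String("serving-port", "", "the port the inference server is listening on")
	vllmPort := fs.String("vllm-port", "", "deprecated: use --serving-port")
	if err := fs.Parse(args); err != nil {
		return "", false, err
	}
	switch {
	case *servingPort != "":
		return *servingPort, false, nil
	case *vllmPort != "":
		return *vllmPort, true, nil
	default:
		return defaultServingPort, false, nil
	}
}
```

The caller would log something like "--vllm-port is deprecated; use --serving-port" whenever deprecated is true, which keeps existing deployments working while steering them toward the new flag.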

hhk7734 commented Oct 28, 2025

@elevran Thanks for the review.
The concept of SGLangConnector is similar to NixlConnector; it coordinates PD to exchange KV-transfer information between the prefiller and decoder.

Since this repository is deprecated, I’ll close this PR for now and resubmit it to llm-d-inference-scheduler once our team has more bandwidth.

hhk7734 deleted the sglang branch October 28, 2025 10:56
moreh-dev closed this by deleting the head repository Oct 28, 2025

elevran commented Oct 28, 2025

Thanks so much.
This LGTM, and I support its inclusion once you have the bandwidth to move it over to llm-d-inference-scheduler.
